This is a self-contained data analysis report for REC-22-436_REST_ECO_MANUSCRIPT_REV1.
The main hypothesis of this study were evaluated in consideration of daytime pumping station and artificial habitat use, which were
Roach will occupy pumping station when artificial habitat is absent
When introduced (pre-exclusion), roach will show equal preference towards artificial habitat and pumping station
When excluded from the pumping station, artificial habitat occupancy will increase
Roach will show a preferential change (compared to pre-exclusion) to occupying artificial habitat when the pumping station is reintroduced (post-exclusion)
Artificial habitat occupancy will be higher in sheltered (vs unsheltered) treatments
library(tidyverse)
library(knitr)
library(kableExtra)
library(funModeling)
library(dlookr)
library(glmmTMB)
library(ggeffects)
library(DHARMa)
library(ggpubr)
roach_wide.csv - wide data set where fish counts are stored in columns c_hab, c_open, c_ps
roach_wide=read_csv("./data/roach_wide.csv")
Determine dataset structure by checking number of rows, columns and data types (i.e., numerical, factors). Identify categorical factors and their levels.
nrow(roach_wide)
## [1] 6942
ncol(roach_wide)
## [1] 24
colnames(roach_wide)
## [1] "date" "time" "run" "trial"
## [5] "treatment" "ps_tank_end" "room_end" "tank"
## [9] "light" "sequence" "day" "seq_day"
## [13] "hab_avail" "ps_avail" "both_avail" "hours_havail"
## [17] "hours_lout" "c_hab" "c_open_screen" "c_open_cent"
## [21] "c_open_hab" "c_open" "c_ps" "c_both"
funModeling:df_status - For each variable it returns: Quantity and percentage of zeros (q_zeros and p_zeros respectively). Same metrics for NA values (q_NA/p_na), and infinite values (q_inf/p_inf). Last two columns indicates data type and quantity of unique values.
roach_wide_status<- funModeling::df_status(roach_wide, print_results = F)
| variable | q_zeros | p_zeros | q_na | p_na | q_inf | p_inf | type | unique |
|---|---|---|---|---|---|---|---|---|
| date | 0 | 0.00 | 0 | 0.00 | 0 | 0 | character | 52 |
| time | 288 | 4.15 | 0 | 0.00 | 0 | 0 | hms-difftime | 24 |
| run | 0 | 0.00 | 0 | 0.00 | 0 | 0 | numeric | 4 |
| trial | 0 | 0.00 | 0 | 0.00 | 0 | 0 | numeric | 24 |
| treatment | 3471 | 50.00 | 0 | 0.00 | 0 | 0 | numeric | 2 |
| ps_tank_end | 3468 | 49.96 | 0 | 0.00 | 0 | 0 | numeric | 2 |
| room_end | 3471 | 50.00 | 0 | 0.00 | 0 | 0 | numeric | 2 |
| tank | 0 | 0.00 | 0 | 0.00 | 0 | 0 | numeric | 6 |
| light | 2616 | 37.68 | 0 | 0.00 | 0 | 0 | numeric | 2 |
| sequence | 0 | 0.00 | 0 | 0.00 | 0 | 0 | numeric | 4 |
| day | 0 | 0.00 | 0 | 0.00 | 0 | 0 | numeric | 13 |
| seq_day | 0 | 0.00 | 0 | 0.00 | 0 | 0 | numeric | 52 |
| hab_avail | 5214 | 75.11 | 0 | 0.00 | 0 | 0 | numeric | 2 |
| ps_avail | 5166 | 74.42 | 0 | 0.00 | 0 | 0 | numeric | 2 |
| both_avail | 3504 | 50.48 | 0 | 0.00 | 0 | 0 | numeric | 2 |
| hours_havail | 0 | 0.00 | 3516 | 50.65 | 0 | 0 | numeric | 72 |
| hours_lout | 288 | 4.15 | 2334 | 33.62 | 0 | 0 | numeric | 16 |
| c_hab | 3201 | 46.11 | 0 | 0.00 | 0 | 0 | numeric | 13 |
| c_open_screen | 5683 | 81.86 | 0 | 0.00 | 0 | 0 | numeric | 13 |
| c_open_cent | 5331 | 76.79 | 0 | 0.00 | 0 | 0 | numeric | 13 |
| c_open_hab | 4510 | 64.97 | 0 | 0.00 | 0 | 0 | numeric | 13 |
| c_open | 3732 | 53.76 | 0 | 0.00 | 0 | 0 | numeric | 13 |
| c_ps | 3724 | 53.64 | 0 | 0.00 | 0 | 0 | numeric | 13 |
| c_both | 599 | 8.63 | 0 | 0.00 | 0 | 0 | numeric | 13 |
Several variables need to be converted to factors and labels added to levels.
Create a lookup table for variable labels.
labels_table <- list(
treatment = c('Covered (B)','Uncovered (A)'),
ps_tank_end = c('RH', 'LH'),
room_end = c('Far', 'Near'),
light = c('Day', 'Night'),
hab_avail = c('AH unavailable', 'AH available'),
ps_avail = c('PS available','PS unavailable'),
both_avail = c('Single Available', 'Both Available'),
sequence = c('Baseline', 'I 1', 'I 2', 'I 3')
)
Convert variables to factors with specific labeling using a loop. Use the lookup table to get the labels for each variable.
for (var in names(labels_table)) {
levels <- unique(roach_wide[[var]])
labels <- labels_table[[var]]
roach_wide[[var]] <- factor(roach_wide[[var]], levels = levels, labels = labels)
}
Convert other variables to factors without adding labels.
other_vars <- c("hours_havail", "hours_lout", "run", "day", "trial")
roach_wide[other_vars] <- lapply(roach_wide[other_vars], as.factor)
Using the roach_wide_status dataframe consider variables with NA values.
roach_na <- roach_wide_status %>%
filter(q_na > 0) %>%
arrange(-p_na) %>%
select(variable, q_na, p_na)
| variable | q_na | p_na |
|---|---|---|
| hours_havail | 3516 | 50.65 |
| hours_lout | 2334 | 33.62 |
Variables with NA not considered in main analysis, NAs can be omitted for visual analysis.
Using the roach_wide_status dataframe consider the spread of zeros in habitat count variables
roach_0 <- roach_wide_status %>%
filter(variable %in% c("c_hab", "c_ps", "c_open", "c_open_cent", "c_open_screen", "c_open_hab")) %>%
arrange(-p_zeros) %>%
select(variable, q_zeros, p_zeros)
| variable | q_zeros | p_zeros |
|---|---|---|
| c_open_screen | 5683 | 81.86 |
| c_open_cent | 5331 | 76.79 |
| c_open_hab | 4510 | 64.97 |
| c_open | 3732 | 53.76 |
| c_ps | 3724 | 53.64 |
| c_hab | 3201 | 46.11 |
High proportion of 0’s in all counts, but will be confounded without consideration for light period and habitat availability. Regardless, the presence of high 0’s will significantly skew variability and make raw counts hard to interpret. Descriptive statistics would be limited by high IQR and misleading min/max values. Consider rescaling count variables for analysis.
Check the main dataframe for outliers.
dlookr::plot_outlier - for each variable specified the function plots outlier information for numerical data diagnosis
dlookr::plot_outlier(roach_wide, "c_hab", "c_ps", "c_open")
Check the main dataframe for data distribution.
dlookr::plot_normality - for each variable specified the function determines normality by plotting histogram and q-q plot of the original data, log transformed, and square root transformed.
dlookr::plot_normality(roach_wide, "c_hab", "c_ps", "c_open")
The code performs a two-step transformation: it first normalizes raw fish count data to a 0-1 scale, and then ensures that the normalized values have a small positive offset to prevent division by zero issues for modelling.
roach_wide <- roach_wide %>%
mutate(across(c(c_hab, c_ps, c_open), ~ . / max(.), .names = "{.col}_normalized"),
across(ends_with("_normalized"), ~ ifelse(. == 0, . + 0.0000000001, .)))
First, create minimal dataset with variables of interest.
roach_wide_sum = select(roach_wide, c_hab, c_ps, c_open, c_hab_normalized, c_ps_normalized, c_open_normalized, sequence, light, treatment)
funModeling::profile_num - Provides an expanded summary including mean, standard deviation, skewness and kurtosis.
Skewness >0 = right-skew, <0 = left-skew, 0 = symmetric. kurtosis >3 = sharp, <3 = flat, 3 = normal
numeric_prof <-funModeling::profiling_num(roach_wide_sum)
| variable | mean | std_dev | variation_coef | p_01 | p_05 | p_25 | p_50 | p_75 | p_95 | p_99 | skewness | kurtosis | iqr | range_98 | range_80 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| c_hab | 4.1780467 | 4.7238987 | 1.130648 | 0 | 0 | 0 | 2.0000000 | 8.0000000 | 12 | 12 | 0.6180048 | 1.782165 | 8.0000000 | [0, 12] | [0, 12] |
| c_ps | 4.3388073 | 5.4305629 | 1.251626 | 0 | 0 | 0 | 0.0000000 | 12.0000000 | 12 | 12 | 0.6121211 | 1.476341 | 12.0000000 | [0, 12] | [0, 12] |
| c_open | 3.4462691 | 4.3204959 | 1.253673 | 0 | 0 | 0 | 0.0000000 | 7.0000000 | 12 | 12 | 0.8004510 | 2.112379 | 7.0000000 | [0, 12] | [0, 11] |
| c_hab_normalized | 0.3481706 | 0.3936582 | 1.130648 | 0 | 0 | 0 | 0.1666667 | 0.6666667 | 1 | 1 | 0.6180048 | 1.782165 | 0.6666667 | [1e-10, 1] | [1e-10, 1] |
| c_ps_normalized | 0.3615673 | 0.4525469 | 1.251626 | 0 | 0 | 0 | 0.0000000 | 1.0000000 | 1 | 1 | 0.6121211 | 1.476341 | 1.0000000 | [1e-10, 1] | [1e-10, 1] |
| c_open_normalized | 0.2871891 | 0.3600413 | 1.253673 | 0 | 0 | 0 | 0.0000000 | 0.5833333 | 1 | 1 | 0.8004510 | 2.112379 | 0.5833333 | [1e-10, 1] | [1e-10, 0.916666666666667] |
Standard deviations of raw counts are high, as expected from normality plots.
Examine raw count descriptive summary before considering rescaled values.
count_sum <- bind_rows(
roach_wide_sum %>%
group_by(light) %>%
summarise(sum_c_hab = sum(c_hab),
sum_c_ps = sum(c_ps),
sum_c_open = sum(c_open)) %>%
rename(variable = light),
roach_wide_sum %>%
group_by(sequence) %>%
summarise(sum_c_hab = sum(c_hab),
sum_c_ps = sum(c_ps),
sum_c_open = sum(c_open)) %>%
rename(variable = sequence),
roach_wide_sum %>%
summarise(sum_c_hab = sum(c_hab),
sum_c_ps = sum(c_ps),
sum_c_open = sum(c_open),
variable = "Total") %>%
mutate(variable = factor(variable)))
| variable | sum_c_hab | sum_c_ps | sum_c_open |
|---|---|---|---|
| Light period | |||
| Day | 15011 | 13426 | 2790 |
| Night | 13993 | 16694 | 21134 |
| Experimental sequence | |||
| Baseline | 0 | 15014 | 6283 |
| I 1 | 4483 | 13129 | 3041 |
| I 2 | 13272 | 0 | 7327 |
| I 3 | 11249 | 1977 | 7273 |
| Total count | |||
| Total | 29004 | 30120 | 23924 |
Examine rescaled counts
rescaled_mean <- bind_rows(
roach_wide_sum %>%
group_by(light) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
mean_c_ps = mean(c_ps_normalized),
mean_c_open = mean(c_open_normalized)) %>%
rename(variable = light),
roach_wide_sum %>%
group_by(sequence) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
mean_c_ps = mean(c_ps_normalized),
mean_c_open = mean(c_open_normalized)) %>%
rename(variable = sequence),
roach_wide_sum %>% filter(light=="Day")%>%
group_by(treatment) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
mean_c_ps = mean(c_ps_normalized),
mean_c_open = mean(c_open_normalized)) %>%
rename(variable = treatment))
| variable | mean_c_hab | mean_c_ps | mean_c_open |
|---|---|---|---|
| Light period | |||
| Day | 0.4781792 | 0.4276886 | 0.0888761 |
| Night | 0.2695523 | 0.3215827 | 0.4071120 |
| Experimental sequence | |||
| Baseline | 0.0000000 | 0.7044857 | 0.2948104 |
| I 1 | 0.2161941 | 0.6331501 | 0.1466532 |
| I 2 | 0.6400463 | 0.0000000 | 0.3533468 |
| I 3 | 0.5481969 | 0.0963450 | 0.3544347 |
| Treatment (daytime) | |||
| Covered (B) | 0.4968782 | 0.4083206 | 0.0899592 |
| Uncovered (A) | 0.4594801 | 0.4470566 | 0.0877931 |
Apply custom themeing for ggplot2 objects.
theme_JN <- function(base_size=12){
theme_grey() %+replace%
theme(
axis.text = element_text(colour="black"),
axis.title = element_text(colour="black"),
axis.ticks = element_line(colour="black"),
panel.border = element_rect(colour = "black", fill=NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
strip.background = element_rect(colour = "black",fill = NA),
panel.spacing.x = unit(12, "pt")
)
}
Plot raw count data first to confirm suitability of rescaling data.
ggplot(roach_wide_sum, aes(x = sequence, y= c_ps)) +
geom_boxplot() +theme_JN()+theme(text=element_text(size=20))
ggplot(roach_wide_sum, aes(x = sequence, y= c_hab)) +
geom_boxplot()+theme_JN()+theme(text=element_text(size=20))
ggplot(roach_wide_sum, aes(x = sequence, y= c_open)) +
geom_boxplot()+theme_JN()+theme(text=element_text(size=20))
The boxplots confirm raw count data are unsuitable for analysis due to large variation in counts presenting large IQR + outliers. Descriptive data e.g., medians are therefore hard to interpret and generalise.
From here on all analysis considers rescaled counts.
#Artificial habitat,light
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
group_by(light) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = light, y = mean_c_hab)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#pumping station, light
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
group_by(light) %>%
summarise(mean_c_hab = mean(c_ps_normalized),
se_c_hab = sd(c_ps_normalized) / sqrt(n())),
aes(x = light, y = mean_c_hab)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#open water, light
ggplot(roach_wide_sum %>%
group_by(light) %>%
summarise(mean_c_hab = mean(c_open_normalized),
se_c_hab = sd(c_open_normalized) / sqrt(n())),
aes(x = light, y = mean_c_hab)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#Artificial habitat, sequence, light
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
group_by(sequence, light) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = sequence, y = mean_c_hab, colour = light)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#pumping station, sequence, light
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
group_by(sequence, light) %>%
summarise(mean_c_hab = mean(c_ps_normalized),
se_c_hab = sd(c_ps_normalized) / sqrt(n())),
aes(x = sequence, y = mean_c_hab, colour = light)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#open water, sequence, light
ggplot(roach_wide_sum %>%
group_by(sequence, light) %>%
summarise(mean_c_hab = mean(c_open_normalized),
se_c_hab = sd(c_open_normalized) / sqrt(n())),
aes(x = sequence, y = mean_c_hab, colour = light)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
By plotting the main visual relationships we’re able to quickly identify relationships in the data and begin addressing study objectives and hypothesis.
Here, we can consider that the day/night relationship in habitat occupancy was relatively fixed throughout the experiment in all habitat options. We see that open water occupancy was decreased in intervention 1, possibly attributed to introducing artificial habitat.
To consider H5, let’s examine treatment effects.
#Artificial habitat, sequence, treatment
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
group_by(sequence, treatment) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = sequence, y = mean_c_hab, colour = treatment)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#pumping station, sequence, treatment
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
group_by(sequence, treatment) %>%
summarise(mean_c_hab = mean(c_ps_normalized),
se_c_hab = sd(c_ps_normalized) / sqrt(n())),
aes(x = sequence, y = mean_c_hab, colour = treatment)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#open water, sequence, treatment
ggplot(roach_wide_sum %>%
group_by(sequence, treatment) %>%
summarise(mean_c_hab = mean(c_open_normalized),
se_c_hab = sd(c_open_normalized) / sqrt(n())),
aes(x = sequence, y = mean_c_hab, colour = treatment)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#Artificial habitat, treatment, light
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>%
group_by(treatment, light) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = treatment, y = mean_c_hab, colour = light)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#daytime counts highest in covered treatments
#pumping station, treatment, light
ggplot(roach_wide_sum %>% filter(sequence!="I 2")%>%
group_by(treatment, light) %>%
summarise(mean_c_hab = mean(c_ps_normalized),
se_c_hab = sd(c_ps_normalized) / sqrt(n())),
aes(x = treatment, y = mean_c_hab, colour = light)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#open water, treatment, light
ggplot(roach_wide_sum %>%
group_by(treatment, light) %>%
summarise(mean_c_hab = mean(c_open_normalized),
se_c_hab = sd(c_open_normalized) / sqrt(n())),
aes(x = treatment, y = mean_c_hab, colour = light)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#Artificial habitat, treatment (daytime)
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>% filter(light=="Day")%>%
group_by(treatment, light) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = treatment, y = mean_c_hab, colour = light)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
#Artificial habitat, sequence, treatment (daytime)
ggplot(roach_wide_sum %>% filter(sequence!="Baseline")%>% filter(light=="Day")%>%
group_by(sequence, treatment) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = sequence, y = mean_c_hab, colour = treatment)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
In these plots we see that treatment has an impact on habitat occupancy and in all sequences, habitat occupancy in artificial habitat was highest in uncovered treatments.Interestingly, occupancy in pumping station shows the opposite relationship to artificial habitat. Pumping station occupancy was highest in uncovered treatments, suggesting that cover is important i.e., when uncovered habitat is provided, more fish occupy the pumping station. Overall, we found support for H5.
Before continuing with statistical analysis let’s examine repeated measures and temporal effects.
#Artificial habitat, trial (daytime)
ggplot(roach_wide %>% filter(sequence!="Baseline")%>% filter(light=="Day")%>%
group_by(trial) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = trial, y = mean_c_hab)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
The variation between trials suggests significant effect of within-subject observations (i.e., repeated measures). Additionally, there appears to be temporal influence (ex. trial 1 compared to trial 24)
ggplot(roach_wide %>% filter(sequence!="Baseline")%>%
group_by(time) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = time, y = mean_c_hab)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
Evidence for hour-to-hour variation. Ensure is included in analysis.
Finally, considering further relationships not required for main objectives.
ggplot(roach_wide %>%
filter(!is.na(hours_havail))%>% filter(sequence =="I 1")%>%
group_by(hours_havail) %>%
summarise(mean_c_hab = mean(c_hab_normalized),
se_c_hab = sd(c_hab_normalized) / sqrt(n())),
aes(x = hours_havail, y = mean_c_hab)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
roach_wide %>%
filter(!is.na(hours_havail)) %>%
filter(sequence == "I 1") %>%
mutate(hours = as.numeric(hours_havail)) %>%
with(cor.test(hours, c_hab_normalized))
##
## Pearson's product-moment correlation
##
## data: hours and c_hab_normalized
## t = 4.8211, df = 1702, p-value = 1.555e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.06896398 0.16266011
## sample estimates:
## cor
## 0.1160703
There is no real relationship. Occupancy peaks 24h after introduction, as expected due to nocturnal behavior. Correlation testing shows a weak positive correlation, but significant.
ggplot(roach_wide %>%
filter(!is.na(hours_lout))%>%
group_by(hours_lout) %>%
summarise(mean_c_hab = mean(c_open_normalized),
se_c_hab = sd(c_open_normalized) / sqrt(n())),
aes(x = hours_lout, y = mean_c_hab)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = mean_c_hab - se_c_hab, ymax = mean_c_hab + se_c_hab),
width = 0.2)+theme_JN()+theme(text=element_text(size=20))
roach_wide %>%
filter(!is.na(hours_lout)) %>%
mutate(hours = as.numeric(hours_lout)) %>%
with(cor.test(hours, c_open_normalized))
##
## Pearson's product-moment correlation
##
## data: hours and c_open_normalized
## t = 9.4592, df = 4606, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1096056 0.1662548
## sample estimates:
## cor
## 0.1380431
Again, no strong relationship. Occupancy peaks within 1h of lights out. Correlation testing shows a weak positive correlation, but significant.
Ordinarily we may now choose to perform group comparisons etc on our variables of interest to support the relationships shown in the visual analysis. However, the nature of the problem is complex and thus is best treated by a modelling approach.
Data exploration has shown:
table(roach_wide$sequence)
##
## Baseline I 1 I 2 I 3
## 1776 1728 1728 1710
table(roach_wide$treatment)
##
## Covered (B) Uncovered (A)
## 3471 3471
table(roach_wide$light)
##
## Day Night
## 2616 4326
The data is well-balanced across the grouping variables except light, which is expected due to day night differences.
Let’s extract hour from the time variable and store as factor for random effect modelling.
roach_wide <- roach_wide %>%
mutate(time_factor = factor(as.numeric(format(strptime(time, format = "%H:%M:%S"), "%H"))))
Create a new dataframe for binary modelling later.
#Create new DF for binary model
roach_binary <- roach_wide %>%filter (sequence=="I 1"|sequence=="I 3")%>%filter(light=="Day")
roach_binary$binary <- ifelse(roach_binary$sequence =="I 1", 0,1)
Start by creating a null (intercept only) model.
mod1 <- glmmTMB(c_hab_normalized ~ 1, data = roach_wide%>%filter(sequence!="Baseline"),
family = gaussian(link="log"))
summary(mod1)
## Family: gaussian ( log )
## Formula: c_hab_normalized ~ 1
## Data: roach_wide %>% filter(sequence != "Baseline")
##
## AIC BIC logLik deviance df.resid
## 4939.5 4952.6 -2467.8 4935.5 5164
##
##
## Dispersion estimate for gaussian family (sigma^2): 0.152
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.7596 0.0116 -65.47 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Intrepting the null model is not important here, but we can see that the estimated mean of our normalised count variable is significantly different from 0. We will use this null model for determining model fit by comparing subsequent models to the null using AIC selection.
Let’s add our fixed effects of interest.
mod1.1 <- update(mod1, c_hab_normalized ~ sequence + light + treatment)
summary(mod1.1)
## Family: gaussian ( log )
## Formula: c_hab_normalized ~ sequence + light + treatment
## Data: roach_wide %>% filter(sequence != "Baseline")
##
## AIC BIC logLik deviance df.resid
## 2422.0 2461.3 -1205.0 2410.0 5160
##
##
## Dispersion estimate for gaussian family (sigma^2): 0.0934
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.20486 0.03538 -34.05 < 2e-16 ***
## sequenceI 2 1.12453 0.03607 31.18 < 2e-16 ***
## sequenceI 3 1.02292 0.03656 27.98 < 2e-16 ***
## lightNight -0.62328 0.01728 -36.07 < 2e-16 ***
## treatmentUncovered (A) -0.08459 0.01619 -5.22 1.76e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model outputs are sensible and have reduced the AIC of the null model from 4939 to 2422. All our predictor terms are significant with sensible coefficent and error estimates. For example, we can see that the mean value of c_hab_normalized is reduced by -0.623 units during night time light conditions.
It can be useful to plot model predictions early whilst building a complete model to understand how adding independent variables affects the estimated response.
plot(ggpredict(mod1.1, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))
Here, we can see that the predicted habitat occupancy response is sensible, and close to the observed values we plotted earlier.
Importantly, random effects need to be added to account for repeated measures and temporal dependency.
mod1.2 <- update(mod1.1, . ~ . + (1 | time_factor/day/trial))
summary(mod1.2)
## Family: gaussian ( log )
## Formula:
## c_hab_normalized ~ sequence + light + treatment + (1 | time_factor/day/trial)
## Data: roach_wide %>% filter(sequence != "Baseline")
##
## AIC BIC logLik deviance df.resid
## 2401.6 2460.5 -1191.8 2383.6 5157
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trial:day:time_factor (Intercept) 1.001e-10 0.00001
## day:time_factor (Intercept) 7.687e-03 0.08767
## time_factor (Intercept) 9.906e-04 0.03147
## Residual 9.123e-02 0.30204
## Number of obs: 5166, groups:
## trial:day:time_factor, 5166; day:time_factor, 216; time_factor, 24
##
## Dispersion estimate for gaussian family (sigma^2): 0.0912
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.19015 0.03873 -30.731 < 2e-16 ***
## sequenceI 2 1.12143 0.03946 28.418 < 2e-16 ***
## sequenceI 3 0.98280 0.04094 24.007 < 2e-16 ***
## lightNight -0.62157 0.02527 -24.600 < 2e-16 ***
## treatmentUncovered (A) -0.08580 0.01598 -5.369 7.9e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ggpredict(mod1.2, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))
As before, the model predictions are improved and the within-subject variation has been handled by adding the random effect of trial.
We can check the goodness of fit by plotting residuals using DHARMA.
fittedmod1.2 <- mod1.2
simuout1 <- simulateResiduals(fittedModel = fittedmod1.2)
plot(simuout1, quantreg = T)
The residuals generally follow a linear relationship and he deviation is accepted within a generalised framework. The quantile deviations present are likely a result of dispersion due to clumped values close to 0 and 1.
The model process is now repeated for the remaining response variables; pumping station and open water.
#Null model
mod2 <- glmmTMB(c_ps_normalized ~ 1, data = roach_wide%>%filter(sequence!="I 2"),
family = gaussian(link="log"))
summary(mod2)
## Family: gaussian ( log )
## Formula: c_ps_normalized ~ 1
## Data: roach_wide %>% filter(sequence != "I 2")
##
## AIC BIC logLik deviance df.resid
## 6784.9 6798.0 -3390.4 6780.9 5212
##
##
## Dispersion estimate for gaussian family (sigma^2): 0.215
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.73106 0.01334 -54.81 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add fixed effects
mod2.1 <- update(mod2, c_ps_normalized ~ sequence + light + treatment)
summary(mod2.1)
## Family: gaussian ( log )
## Formula: c_ps_normalized ~ sequence + light + treatment
## Data: roach_wide %>% filter(sequence != "I 2")
##
## AIC BIC logLik deviance df.resid
## 4372.6 4411.9 -2180.3 4360.6 5208
##
##
## Dispersion estimate for gaussian family (sigma^2): 0.135
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.20719 0.01776 -11.665 < 2e-16 ***
## sequenceI 1 -0.10587 0.01847 -5.733 9.88e-09 ***
## sequenceI 3 -2.01448 0.09481 -21.248 < 2e-16 ***
## lightNight -0.29107 0.01827 -15.932 < 2e-16 ***
## treatmentUncovered (A) 0.05181 0.01827 2.836 0.00457 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add random effects
mod2.2 <- update(mod2.1, . ~ . + (1 | time_factor/day/trial))
summary(mod2.2)
## Family: gaussian ( log )
## Formula:
## c_ps_normalized ~ sequence + light + treatment + (1 | time_factor/day/trial)
## Data: roach_wide %>% filter(sequence != "I 2")
##
## AIC BIC logLik deviance df.resid
## 4218.1 4277.1 -2100.1 4200.1 5205
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trial:day:time_factor (Intercept) 9.025e-02 3.004e-01
## day:time_factor (Intercept) 2.876e-11 5.363e-06
## time_factor (Intercept) 3.499e-13 5.915e-07
## Residual 1.061e-01 3.257e-01
## Number of obs: 5214, groups:
## trial:day:time_factor, 5214; day:time_factor, 218; time_factor, 24
##
## Dispersion estimate for gaussian family (sigma^2): 0.106
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.26812 0.01981 -13.536 < 2e-16 ***
## sequenceI 1 -0.09148 0.01932 -4.736 2.18e-06 ***
## sequenceI 3 -2.05188 0.08633 -23.768 < 2e-16 ***
## lightNight -0.25369 0.01920 -13.212 < 2e-16 ***
## treatmentUncovered (A) 0.04263 0.01915 2.227 0.026 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ggpredict(mod2.2, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))
#Null model
mod3 <- glmmTMB(c_open_normalized ~ 1, data = roach_wide,
family = gaussian(link="log"))
summary(mod3)
## Family: gaussian ( log )
## Formula: c_open_normalized ~ 1
## Data: roach_wide
##
## AIC BIC logLik deviance df.resid
## 5520.5 5534.2 -2758.3 5516.5 6940
##
##
## Dispersion estimate for gaussian family (sigma^2): 0.13
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.24761 0.01505 -82.92 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add fixed effects
mod3.1 <- update(mod3, c_open_normalized ~ sequence + light + treatment)
summary(mod3.1)
## Family: gaussian ( log )
## Formula: c_open_normalized ~ sequence + light + treatment
## Data: roach_wide
##
## AIC BIC logLik deviance df.resid
## 3544.6 3592.6 -1765.3 3530.6 6935
##
##
## Dispersion estimate for gaussian family (sigma^2): 0.0974
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.47386 0.07071 -34.99 < 2e-16 ***
## sequenceI 1 -0.54735 0.04770 -11.48 < 2e-16 ***
## sequenceI 2 0.21837 0.03055 7.15 8.86e-13 ***
## sequenceI 3 0.28354 0.02989 9.49 < 2e-16 ***
## lightNight 1.49549 0.06583 22.72 < 2e-16 ***
## treatmentUncovered (A) 0.08195 0.02220 3.69 0.000223 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Add random effects
mod3.2 <- update(mod3.1, . ~ . + (1 | time_factor/day/trial))
summary(mod3.2)
## Family: gaussian ( log )
## Formula:
## c_open_normalized ~ sequence + light + treatment + (1 | time_factor/day/trial)
## Data: roach_wide
##
## AIC BIC logLik deviance df.resid
## 3162.7 3231.2 -1571.4 3142.7 6932
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trial:day:time_factor (Intercept) 3.340e-01 5.780e-01
## day:time_factor (Intercept) 2.883e-11 5.369e-06
## time_factor (Intercept) 9.907e-16 3.147e-08
## Residual 5.889e-02 2.427e-01
## Number of obs: 6942, groups:
## trial:day:time_factor, 6942; day:time_factor, 290; time_factor, 24
##
## Dispersion estimate for gaussian family (sigma^2): 0.0589
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.39186 0.05561 -43.01 < 2e-16 ***
## sequenceI 1 -0.65043 0.04127 -15.76 < 2e-16 ***
## sequenceI 2 0.09886 0.03416 2.89 0.003804 **
## sequenceI 3 0.12975 0.03364 3.86 0.000115 ***
## lightNight 1.38818 0.04682 29.65 < 2e-16 ***
## treatmentUncovered (A) 0.03996 0.02421 1.65 0.098784 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ggpredict(mod3.2, terms = c("sequence","light")))+theme_JN()+theme(text=element_text(size=20))
To separately analyse the effect of habitat exclusion on overall probability of fish occupying a given habitat post-exclusion (sequence: I 3) let’s build a pair of GLMMs in glmmTMB with binomial distribution (link = logit). In these models, fixed effects are limited to experimental sequence (I 1, I 3) and used the same random effects specified in the main models.
mod_binary_ah <- glmmTMB(c_hab_normalized ~ as.factor(binary) + (1 | time_factor/day/trial), data = roach_binary, family = binomial())
summary(mod_binary_ah)
## Family: binomial ( logit )
## Formula:
## c_hab_normalized ~ as.factor(binary) + (1 | time_factor/day/trial)
## Data: roach_binary
##
## AIC BIC logLik deviance df.resid
## 1258.3 1284.1 -624.2 1248.3 1273
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trial:day:time_factor (Intercept) 1.903e-09 4.363e-05
## day:time_factor (Intercept) 6.201e-02 2.490e-01
## time_factor (Intercept) 7.335e-28 2.708e-14
## Number of obs: 1278, groups:
## trial:day:time_factor, 1278; day:time_factor, 60; time_factor, 10
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.1126 0.1031 -10.79 <2e-16 ***
## as.factor(binary)1 2.9091 0.1643 17.71 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model predicts a significant increase in the probability of fish occupying artificial habitat post-exclusion.
plot(ggpredict(mod_binary_ah, terms = c("binary")))+theme_JN()+theme(text=element_text(size=20))
mod_binary_ps <- glmmTMB(c_ps_normalized ~ as.factor(binary) + (1 | time_factor/day/trial), data = roach_binary, family = binomial())
summary(mod_binary_ps)
## Family: binomial ( logit )
## Formula:
## c_ps_normalized ~ as.factor(binary) + (1 | time_factor/day/trial)
## Data: roach_binary
##
## AIC BIC logLik deviance df.resid
## 1127.6 1153.3 -558.8 1117.6 1273
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## trial:day:time_factor (Intercept) 2.877e-09 5.364e-05
## day:time_factor (Intercept) 4.603e-12 2.145e-06
## time_factor (Intercept) 1.232e-20 1.110e-10
## Number of obs: 1278, groups:
## trial:day:time_factor, 1278; day:time_factor, 60; time_factor, 10
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.05317 0.08971 11.74 <2e-16 ***
## as.factor(binary)1 -3.38536 0.16652 -20.33 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The model predicts a significant decrease in the probability of fish occupying pumping station post-exclusion.
plot(ggpredict(mod_binary_ps, terms = c("binary")))+theme_JN()+theme(text=element_text(size=20))
We will save the individual model predictions using ggpredict and then combine and restructure the dataframe using dplyr from the tidyverse.
c_hab_glmmm <-ggpredict(mod1.2, terms = c("sequence", "light"))
c_ps_glmmm <-ggpredict(mod2.2, terms = c("sequence", "light"))
c_open_glmmm <-ggpredict(mod3.2, terms = c("sequence", "light"))
c_hab_glmmm_treat <-ggpredict(mod1.2, terms = c("sequence", "treatment", "light[Day]"))
Next, a grouping variable for each habitat type is added. A second variable ‘present’ is added to indicate if data is present for the experimental sequence.
c_hab_glmmm <- c_hab_glmmm %>% mutate(habitat = "Artificial habitat", present = "T")
c_ps_glmmm <- c_ps_glmmm %>% mutate(habitat = "Pumping station", present = "T")
c_open_glmmm <- c_open_glmmm %>% mutate(habitat = "Open water", present = "T")
The dataframes are then combined and restructured for plotting. First, the rows are bound then NA values are omitted and dataframe attributes are removed (for simplicity, not necessity). Dummy rows are added for the sequence groups without data, and variables are then converted to factors with specific levels.
modeloutput_df <- bind_rows(c_hab_glmmm, c_ps_glmmm, c_open_glmmm)
modeloutput_df <- na.omit(modeloutput_df)
modeloutput_df <- as.data.frame(unclass(modeloutput_df))
modeloutput_df <- modeloutput_df %>%
add_row(x = 'Baseline', group = 'Day', habitat = 'Artificial habitat', present = "F") %>%
add_row(x = 'Baseline', group = 'Night', habitat = 'Artificial habitat', present = "F") %>%
add_row(x = 'I 2', group = 'Day', habitat = 'Pumping station', present = "F") %>%
add_row(x = 'I 2', group = 'Night', habitat = 'Pumping station', present = "F")
modeloutput_df$x <- factor(modeloutput_df$x, levels = c('Baseline', 'I 1', 'I 2', 'I 3'))
modeloutput_df$habitat <- as.factor(modeloutput_df$habitat)
modeloutput_df$habitat <- factor(modeloutput_df$habitat, levels = c('Pumping station', 'Open water', 'Artificial habitat'))
| x | predicted | std.error | conf.low | conf.high | group | habitat | present |
|---|---|---|---|---|---|---|---|
| I 1 | 0.3041765 | 0.0387274 | 0.2819427 | 0.3281637 | Day | Artificial habitat | T |
| I 1 | 0.1633742 | 0.0409847 | 0.1507639 | 0.1770393 | Night | Artificial habitat | T |
| I 2 | 0.9335923 | 0.0222946 | 0.8936759 | 0.9752916 | Day | Artificial habitat | T |
| I 2 | 0.5014354 | 0.0227331 | 0.4795839 | 0.5242825 | Night | Artificial habitat | T |
| I 3 | 0.8127345 | 0.0240701 | 0.7752828 | 0.8519954 | Day | Artificial habitat | T |
| I 3 | 0.4365223 | 0.0238856 | 0.4165575 | 0.4574439 | Night | Artificial habitat | T |
| Baseline | 0.7648180 | 0.0198081 | 0.7356943 | 0.7950946 | Day | Pumping station | T |
| Baseline | 0.5934450 | 0.0191550 | 0.5715783 | 0.6161482 | Night | Pumping station | T |
| I 1 | 0.6979559 | 0.0205494 | 0.6704035 | 0.7266407 | Day | Pumping station | T |
| I 1 | 0.5415647 | 0.0197917 | 0.5209591 | 0.5629853 | Night | Pumping station | T |
| I 3 | 0.0982743 | 0.0863840 | 0.0829678 | 0.1164046 | Day | Pumping station | T |
| I 3 | 0.0762539 | 0.0867094 | 0.0643361 | 0.0903794 | Night | Pumping station | T |
| Baseline | 0.0914594 | 0.0556065 | 0.0820155 | 0.1019907 | Day | Open water | T |
| Baseline | 0.3665273 | 0.0286108 | 0.3465395 | 0.3876679 | Night | Open water | T |
| I 1 | 0.0477254 | 0.0596329 | 0.0424610 | 0.0536426 | Day | Open water | T |
| I 1 | 0.1912617 | 0.0373060 | 0.1777760 | 0.2057704 | Night | Open water | T |
| I 2 | 0.1009631 | 0.0519537 | 0.0911884 | 0.1117855 | Day | Open water | T |
| I 2 | 0.4046138 | 0.0272225 | 0.3835914 | 0.4267883 | Night | Open water | T |
| I 3 | 0.1041301 | 0.0506727 | 0.0942852 | 0.1150029 | Day | Open water | T |
| I 3 | 0.4173058 | 0.0265197 | 0.3961693 | 0.4395699 | Night | Open water | T |
| Baseline | NA | NA | NA | NA | Day | Artificial habitat | F |
| Baseline | NA | NA | NA | NA | Night | Artificial habitat | F |
| I 2 | NA | NA | NA | NA | Day | Pumping station | F |
| I 2 | NA | NA | NA | NA | Night | Pumping station | F |
A similar process follows for the treatment dataframe.
c_hab_glmmm_treat <- as.data.frame(unclass(c_hab_glmmm_treat))
c_hab_glmmm_treat <- c_hab_glmmm_treat %>% mutate(present = "T")
c_hab_glmmm_treat <- c_hab_glmmm_treat %>% select(-facet)
c_hab_glmmm_treat <- c_hab_glmmm_treat %>%
add_row(x = 'Baseline', group = 'Covered (B)', present = "F") %>%
add_row(x = 'Baseline', group = 'Uncovered (A)', present = "F")
c_hab_glmmm_treat$x <- factor(c_hab_glmmm_treat$x, levels = c('Baseline', 'I 1', 'I 2', 'I 3'))
c_hab_glmmm_treat$group <- factor(c_hab_glmmm_treat$group, levels = c('Uncovered (A)', 'Covered (B)'))
| x | predicted | std.error | conf.low | conf.high | group | present |
|---|---|---|---|---|---|---|
| I 1 | 0.3041765 | 0.0387274 | 0.2819427 | 0.3281637 | Covered (B) | T |
| I 1 | 0.2791664 | 0.0388745 | 0.2586861 | 0.3012682 | Uncovered (A) | T |
| I 2 | 0.9335923 | 0.0222946 | 0.8936759 | 0.9752916 | Covered (B) | T |
| I 2 | 0.8568302 | 0.0228311 | 0.8193338 | 0.8960425 | Uncovered (A) | T |
| I 3 | 0.8127345 | 0.0240701 | 0.7752828 | 0.8519954 | Covered (B) | T |
| I 3 | 0.7459096 | 0.0245659 | 0.7108463 | 0.7827025 | Uncovered (A) | T |
| Baseline | NA | NA | NA | NA | Covered (B) | F |
| Baseline | NA | NA | NA | NA | Uncovered (A) | F |
The binary model predictions are then saved. The grouping variable is modified to allow for easy plotting.
ah_prob <- ggpredict(mod_binary_ah, terms = c("binary"))
ps_prob <- ggpredict(mod_binary_ps, terms = c("binary"))
ps_prob$group <- 2
ps_prob$group <- factor(ps_prob$group)
habitat_prob_df <- bind_rows(ah_prob, ps_prob)
| x | predicted | std.error | conf.low | conf.high | group |
|---|---|---|---|---|---|
| 0 | 0.2473919 | 0.1030883 | 0.2117147 | 0.2868931 | 1 |
| 1 | 0.8577281 | 0.1261066 | 0.8248246 | 0.8853107 | 1 |
| 0 | 0.7413837 | 0.0897146 | 0.7062698 | 0.7736453 | 2 |
| 1 | 0.0884921 | 0.1402805 | 0.0686808 | 0.1133226 | 2 |
The final model outputs can now be plotted. The following figures are finalized versions with some edits made after exporting from R, thus the code snippet will not produce an identical figure.
ggplot(data=modeloutput_df, aes(x=x, y=predicted, fill=group))+
geom_tile(aes(x=x, y=0.5,height = Inf, fill=present), alpha = 0.3, show.legend = FALSE) +
scale_fill_manual(values = c("T" = "grey80", "F" = "grey90"), guide = FALSE) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), size=0.5,width = 0.6 ,position = position_dodge(width = 0.7), show.legend = FALSE) +
#geom_path(aes(group = interaction(habitat, group), linetype = group), linewidth = 0.3 ,position = position_dodge(width = 0.7), show.legend = FALSE) +
scale_linetype_manual(values = c("Day" = "dashed", "Night" = "dotted"))+
geom_point(aes(shape=group),size=2, position = position_dodge(width = 0.7), show.legend = FALSE) +
scale_shape_manual(values = c("Day" = 20, "Night" = 4))+
scale_y_continuous(breaks = seq(0, 1,0.1), limits=c(0,1),expand=c(0.05,0)) +
scale_x_discrete(expand=c(0,0))+
labs(x = 'Experimental sequence',y= 'Habitat occupancy')+
theme_JN()+
theme(axis.text.x=element_text(size=8),
strip.background = element_rect(fill = "grey90"),
panel.spacing.x =unit(0, "lines") ) +
facet_grid(~ habitat, scales = "fixed")+
geom_text(data = modeloutput_df %>% filter(present == "F"), aes(x = x, y = 0.5, label = "Unavailable"), size = 8/.pt, angle = 90, fontface = "italic")
ggplot(data=c_hab_glmmm_treat, aes(x=x, y=predicted, fill=group))+
geom_tile(aes(x=x, y=0.5,height = Inf, fill=present), alpha = 0.3, show.legend = FALSE) +
scale_fill_manual(values = c("T" = NA, "F" = "grey90"), guide = FALSE) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), size=0.5,width = 0.6 ,position = position_dodge(width = 0.7), show.legend = FALSE) +
geom_point(aes(shape=group),size=1.5, position = position_dodge(width = 0.7), show.legend = FALSE) +
scale_shape_manual(values = c("Covered (B)" = 20, "Uncovered (A)" = 4))+
scale_y_continuous(breaks = seq(0, 1,0.1), limits=c(0,1),expand=c(0.05,0)) +
scale_x_discrete(expand=c(0,0))+
labs(x = 'Experimental sequence',y= 'Daytime artifical habitat occupancy')+
coord_cartesian(clip="off")+
theme_JN()+
theme(axis.text.x=element_text(size=8)) +
geom_text(data = c_hab_glmmm_treat %>% filter(present == "F"), aes(x = x, y = 0.5, label = "Unavailable"), size = 8/.pt, angle = 90, fontface = "italic")
ggplot(data=habitat_prob_df, aes(x = factor(x, labels = c("Pre-exclusion", "Post-exclusion")), y=predicted, fill=group))+
geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width=.2) +
geom_line(aes(group=group,linetype = group), show.legend = FALSE)+
geom_point(aes(shape=group),size=2, show.legend = FALSE) +
scale_shape_manual(values = c("1" = 20, "2" = 4))+
scale_linetype_manual(values = c("1" = "dashed", "2" = "dotted"))+
scale_y_continuous(breaks = seq(0, 1, 0.1), limits = c(0, 1), expand = c(0.05, 0),
labels = scales::percent(seq(0, 1, 0.1), scale = 100)) +
scale_x_discrete()+
labs(x = 'Pumping station exclusion',y=expression(atop(NA, atop(textstyle('Predicted probabaility of'), textstyle('daytime habitat occupancy')))))+
coord_cartesian(clip="off")+
theme_JN()+
theme(axis.text.x=element_text(size=8))
Post-hoc analysis can now be performed to support the model outputs. Tests of relevance here include paired t.tests and repeated measures ANOVA.
#AH
anov1<- aov(c_hab_normalized ~ sequence + Error(trial), data = roach_wide%>%filter(sequence!="Baseline"))
summary(anov1)
##
## Error: trial
## Df Sum Sq Mean Sq F value Pr(>F)
## sequence 1 43.70 43.70 9.683 0.00508 **
## Residuals 22 99.29 4.51
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## sequence 2 171.6 85.78 934.6 <2e-16 ***
## Residuals 5140 471.8 0.09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#AH
anov2<- aov(c_ps_normalized ~ sequence + Error(trial), data = roach_wide%>%filter(sequence!="I 2"))
summary(anov2)
##
## Error: trial
## Df Sum Sq Mean Sq F value Pr(>F)
## sequence 1 126.0 125.97 13.15 0.00149 **
## Residuals 22 210.8 9.58
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## sequence 2 380.5 190.24 2446 <2e-16 ***
## Residuals 5188 403.5 0.08
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#OW
anov3<- aov(c_open_normalized ~ sequence + Error(trial), data = roach_wide)
summary(anov3)
##
## Error: trial
## Df Sum Sq Mean Sq F value Pr(>F)
## sequence 1 15.24 15.239 3.013 0.0966 .
## Residuals 22 111.27 5.058
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## sequence 3 49.5 16.487 157.5 <2e-16 ***
## Residuals 6915 723.8 0.105
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1